Cooperation-eliciting prisoner's dilemma payoffs for reinforcement learning agents
Abstract
This work considers a stateless Q-learning agent in the iterated Prisoner’s Dilemma (PD). In earlier work we gave a condition on the PD payoffs and Q-learning parameters that helps stateless Q-learning agents cooperate with each other [2]. That condition, however, rests on a restrictive premise. This work relaxes the premise and presents a new payoff condition for mutual cooperation. From the new condition we then derive the payoff relations that elicit mutual cooperation.
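To make the setting concrete, the following is a minimal sketch of two stateless Q-learning agents playing the iterated PD. It is not the authors' implementation: the payoff values T, R, P, S, the epsilon-greedy action selection, the learning rate alpha, the discount factor gamma, and all function names are assumptions chosen for illustration. Each agent keeps one Q-value per action (no state) and moves it toward the received payoff plus the discounted best current Q-value.

```python
import random

# Assumed illustrative payoffs with T > R > P > S; not values from the paper.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
# PAYOFF[(my_action, opponent_action)] is the payoff to "me".
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
ACTIONS = ("C", "D")


def choose(q, epsilon, rng):
    """Epsilon-greedy selection over a stateless Q-table (assumed policy)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Ties are broken in favour of the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: q[a])


def update(q, action, reward, alpha, gamma):
    """Stateless Q-learning update: Q(a) += alpha * (r + gamma * max Q - Q(a))."""
    target = reward + gamma * max(q.values())
    q[action] += alpha * (target - q[action])


def play(rounds=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run the iterated PD between two independent stateless Q-learners."""
    rng = random.Random(seed)
    q1 = {a: 0.0 for a in ACTIONS}
    q2 = {a: 0.0 for a in ACTIONS}
    history = []
    for _ in range(rounds):
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        update(q1, a1, PAYOFF[(a1, a2)], alpha, gamma)
        update(q2, a2, PAYOFF[(a2, a1)], alpha, gamma)
        history.append((a1, a2))
    return q1, q2, history
```

Calling `play()` returns both Q-tables and the joint action history, so the share of mutually cooperative rounds can be compared across different payoff choices. Whether cooperation emerges depends on the payoffs and parameters, which is the question the payoff condition addresses.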
Similar resources
Mood modelling within reinforcement learning
Simulating mood within a decision making process has been shown to allow cooperation to occur within the Prisoner’s Dilemma. In this paper we propose how to integrate a mood model into the classical reinforcement learning algorithm Sarsa, and show how this addition can allow self-interested agents to be successful within a multi agent environment. The human-inspired moody agent will learn to co...
The Speed of Learning in Noisy Games: Partial Reinforcement and the Sustainability of Cooperation
In an experiment, players’ ability to learn to cooperate in the repeated prisoner’s dilemma was substantially diminished when the payoffs were noisy, even though players could monitor one another’s past actions perfectly. In contrast, in one-time play against a succession of opponents, noisy payoffs increased cooperation, by slowing the rate at which cooperation decays. These observations are c...
Multiagent Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated Prisoner's Dilemma
This paper investigates Multiagent Reinforcement Learning (MARL) in a general-sum game where the payoffs’ structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and non-spiking agents in the Iterated P...
Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.
Reinforcement learning (RL) is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable results. Q-learning is a recent RL algorithm that does not need a model of its environment and can be used on-line. Therefore, it is well suited for use in repeated games against an unknown opponent....
Backward vs. Forward-Oriented Decision Making in the Iterated Prisoner's Dilemma: A Comparison Between Two Connectionist Models
We compare the performance of two connectionist models developed to model specific aspects of the decision making process in the Iterated Prisoner’s Dilemma Game. Both models are based on common recurrent network architecture. The first of them uses a backward-oriented reinforcement learning algorithm for learning to play the game while the second one makes its move decisions based on generated...